Byte-index Chunking Algorithm for Data Deduplication System

Related articles

Leap-based Content Defined Chunking - Theory and Implementation

Content Defined Chunking (CDC) is an important component in data deduplication, which affects both the deduplication ratio and deduplication performance. The sliding-window-based CDC algorithm and its variants have been the most popular CDC algorithms for the last 15 years. However, their performance is limited in certain application scenarios since they have to slide byte by byte. The a...

FastCDC: a Fast and Efficient Content-Defined Chunking Approach for Data Deduplication

Content-Defined Chunking (CDC) has been playing a key role in data deduplication systems in the past 15 years or so due to its high redundancy detection ability. However, existing CDC-based approaches introduce heavy CPU overhead because they declare the chunk cutpoints by computing and judging the rolling hashes of the data stream byte by byte. In this paper, we propose FastCDC, a Fast and eff...

Two-Level Metadata Management for Data Deduplication System

Data deduplication is an essential solution for reducing storage space requirements. Chunking-based data deduplication is especially effective for backup workloads, which tend to consist of files that evolve slowly, mainly through small changes and additions. In this paper, we introduce a novel data deduplication scheme that can be used efficiently and quickly over low-bandwidth networks. The key p...

DDSF: A Data Deduplication System Framework for Cloud Environments

Cloud storage has been widely used because it can provide seemingly unlimited storage space and flexible access, while the rising cost of storage and communications is an issue. In this paper, we propose a Data Deduplication System Framework (DDSF) for cloud storage environments. The DDSF consists of three major components: the client, the fingerprint server, and the storage component. The client com...

Distributed Data Deduplication

Data deduplication refers to the process of identifying tuples in a relation that refer to the same real world entity. The complexity of the problem is inherently quadratic with respect to the number of tuples, since a similarity value must be computed for every pair of tuples. To avoid comparing tuple pairs that are obviously non-duplicates, blocking techniques are used to divide the tuples in...

Journal

Journal title: International Journal of Security and Its Applications

Year: 2013

ISSN: 1738-9976

DOI: 10.14257/ijsia.2013.7.5.38